Reliable OCR solution for digital content re-mastering
Abstract
This paper addresses system-level aspects of OCR solutions in the context of digital content re-mastering. It analyzes the unique requirements and challenges of implementing a reliable OCR system in a high-volume, unattended environment. A new reliability metric is proposed, and a practical solution based on combining multiple commercial OCR engines is introduced. Experimental results show that the combined system is both much more accurate and more reliable than the individual engines, so it can fully satisfy the needs of digital content re-mastering applications.
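The abstract does not say how the engines are combined or how the reliability metric is defined, so the following is only an illustrative sketch of one common approach: a per-token majority vote over the outputs of several engines, with the share of agreeing engines used as a rough confidence proxy. The function names `combine_tokens` and `flag_for_review`, the assumption that engine outputs are already aligned token by token, and the agreement score itself are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def combine_tokens(engine_outputs):
    """Majority-vote over pre-aligned token lists, one list per OCR engine.

    Assumption: every engine returned the same number of tokens and
    position i in each list refers to the same word on the page.
    Returns a list of (token, agreement) pairs, where agreement is the
    fraction of engines that voted for the chosen token.
    """
    if not engine_outputs:
        raise ValueError("at least one engine output is required")
    length = len(engine_outputs[0])
    if any(len(tokens) != length for tokens in engine_outputs):
        raise ValueError("engine outputs must be aligned to the same length")

    combined = []
    for candidates in zip(*engine_outputs):
        # On a tie, Counter keeps first-seen order, so the earliest engine wins.
        token, votes = Counter(candidates).most_common(1)[0]
        combined.append((token, votes / len(candidates)))
    return combined


def flag_for_review(combined, threshold):
    """Return the positions whose agreement falls below the threshold."""
    return [i for i, (_, agreement) in enumerate(combined) if agreement < threshold]


if __name__ == "__main__":
    # Hypothetical outputs from three engines for the same three words.
    outputs = [
        ["digital", "content", "re-mastering"],
        ["digital", "c0ntent", "re-mastering"],
        ["digital", "content", "re-rnastering"],
    ]
    result = combine_tokens(outputs)
    print([token for token, _ in result])
    print(flag_for_review(result, threshold=0.75))
```

In an unattended pipeline, a score of this kind could decide which words are accepted automatically and which are set aside for later inspection. A real system would also need to align outputs that differ in length, which this sketch deliberately leaves out.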
Similar resources
Automatic document navigation for digital content remastering
Keywords: digital content re-mastering, document structure analysis, print on demand, content linking, OCR. This paper presents a novel method of automatically adding navigation capabilities to re-mastered electronic books. We first analyze the need for a generic and robust system to automatically construct navigation links into re-mastered books. We then introduce the core algorithm based on text matchin...
DRR Research beyond COTS OCR Software: A Survey
After decades of research, Optical Character Recognition (OCR) has entered into a relatively mature stage. Commercial off-the-shelf (COTS) OCR software packages have become powerful tools in Document Recognition and Retrieval (DRR) applications. One question naturally arises: What areas are left for new DRR research beyond COTS OCR software? There are many discussions around it in recent confer...
OCR Context-Sensitive Error Correction Based on Google Web 1T 5-Gram Data Set
Since the dawn of the computing era, information has been represented digitally so that it can be processed by electronic computers. Paper books and documents were abundant and widely published at that time; hence, there was a need to convert them into digital format. OCR, short for Optical Character Recognition, was conceived to translate paper-based books into digital e-books. Regret...
Digital Storytelling in a Foreign Language Classroom of Higher Educational Establishments
The conceptual idea of the paper is that the use of digital biographical narratives in a foreign language classroom creates favorable conditions for the harmonious development of the creative, cognitive, communicative and technological skills of students. The paper deals with the methods of teaching students to create digital biographical narratives about the life of outstanding personalities whic...
Boosting OCR Accuracy Using Crowdsourcing
Book digitization is important work in preserving ancient heritage. However, digitizing books involves a series of labor-intensive tasks, one of which is verifying optical character recognition (OCR) output. In this paper, we propose a crowdsourceable OCR verification method. Using our method, content holders are able to leverage the power of crowds to complete verification tasks and avo...